Olfactory learning alters navigation strategies and behavioral variability in C. elegans

Animals adjust their behavioral response to sensory input adaptively depending on past experiences. The flexible brain computation is crucial for survival and is of great interest in neuroscience. The nematode C. elegans modulates its navigation behavior depending on the association of odor butanone with food (appetitive training) or starvation (aversive training), and will then climb up the butanone gradient or ignore it, respectively. However, the exact change in navigation strategy in response to learning is still unknown. Here we study the learned odor navigation in worms by combining precise experimental measurement and a novel descriptive model of navigation. Our model consists of two known navigation strategies in worms: biased random walk and weathervaning. We infer weights on these strategies by applying the model to worm navigation trajectories and the exact odor concentration it experiences. Compared to naive worms, appetitive trained worms up-regulate the biased random walk strategy, and aversive trained worms down-regulate the weathervaning strategy. The statistical model provides prediction with $>90 \%$ accuracy of the past training condition given navigation data, which outperforms the classical chemotaxis metric. We find that the behavioral variability is altered by learning, such that worms are less variable after training compared to naive ones. The model further predicts the learning-dependent response and variability under optogenetic perturbation of the olfactory neuron AWC$^\mathrm{ON}$. Lastly, we investigate neural circuits downstream from AWC$^\mathrm{ON}$ that are differentially recruited for learned odor-guided navigation. Together, we provide a new paradigm to quantify flexible navigation algorithms and pinpoint the underlying neural substrates.

Learned olfactory navigation is a powerful platform for studying how a brain implements associative learning of complex behaviors. However, the quantitative change in sensorimotor transformation and the underlying neural circuit substrates to generate learned goal-directed navigation are still unclear. Here we investigate learning-dependent sensorimotor processing and the neural basis for navigation in the nematode Caenorhabditis elegans by measuring, modeling, and perturbing learned odor-guided navigation. We develop a novel statistical model to characterize how the worm employs two behavioral strategies: a biased random walk and weathervaning. We infer weights on these strategies and characterize sensorimotor kernels that govern them by fitting our model to the worm's time-varying navigation trajectories and precise odor concentration measurements. Compared to naive worms, appetitive trained worms up-regulate their biased random walk strategy, while aversive trained worms down-regulate their weathervaning strategy. The model predicts an animal's past learned experience with > 90% accuracy given finite observations, outperforming a classical chemotaxis metric. The model trained on natural odors further predicts the animals' learning-dependent response to optogenetically induced odor perception. Our measurements and model show that behavioral variability is altered by learning: trained worms exhibit less variable navigation than naive ones. Genetically disrupting individual interneuron classes downstream of an odor-sensing neuron reveals that learned navigation strategies are distributed in the network. Together, we present a flexible navigation algorithm that is supported by distributed neural computation in a compact brain.

Learning is a fundamental property of neural systems that allows an animal to flexibly alter behavior based on experience. Learned olfactory navigation provides an ideal framework for studying learning because it is a naturalistic behavior, common to species across scales (1,2), ethologically relevant for seeking food (3,4) and avoiding pathogens (5,6), and can be rapidly learned in very few trials (4,7,8). Careful characterization of learned navigation behavior in a naturalistic context can shed light on the flexible computation performed by the brain (9).
To study how animals flexibly alter their olfactory navigation upon learning, we focus on the nematode worm C. elegans. This worm has a compact nervous system (10), well-characterized olfactory neural circuits (11–15) and navigation behavior that has been studied in detail (11,16). Worms learn to associate an odor with the presence or absence of food and then will either navigate towards higher concentrations of the odor or will ignore the odor, respectively (3,17). However, it is unknown how the worms' navigation strategies are altered by learning to achieve these feats. We aim to quantitatively characterize the learned navigation strategies and their underlying sensory transformations to constrain neural mechanisms of learned navigation.
The worm navigates in sensory environments mainly through two strategies: klinotaxis and klinokinesis (16,18,19). Klinotaxis is a process in which the worm continuously modulates its heading to align with the local gradient, also known as "weathervaning" (18). Klinokinesis is a biased random walk (20,21) in which the worm produces sharp turns called "pirouettes" with a probability that depends on the animal's estimate of the local gradient of a sensory cue (16,20,22). In both cases the animal estimates information about the local gradient by comparing measurements of the stimuli across time. Both strategies contribute to sensory-guided navigation in landscapes of temperature (19), salt (18,23,24), and certain odors (11,18,25,26).
Olfactory navigation has significant relevance in studying learning because there are odors for which the animal learns to alter its "valence" (positive or negative) with respect to the odor, similar to associative learning in many other animals. This contrasts with context-dependent salt or thermosensing in C. elegans in which the worm instead learns a preferred salt concentration or temperature set point based on past experience (19,23). For learned olfactory navigation, it is not known how the biased random walk and weathervaning strategies change upon learning, nor how the detailed quantitative transformations between sensory input and motor output are changed.

Significance Statement
Learning is a feature of species across scales. How does the brain flexibly learn to produce complex behavior? We focus on C. elegans to study how navigation strategies depend on olfactory learning by utilizing precise measurements and a novel model. The fitted model shows that learning alters two known navigation strategies in worms and the model outperforms a classical metric in decoding the animal's past learning experience. We discover that learned navigation can express itself in a context-dependent manner. In addition, through perturbing individual interneurons, we find that most neurons' contributions are distributed across strategies and are differentially altered by learning. We expect these flexible behavioral algorithms and neural computation to be generalized to other species and behavior.

Quantitative analysis of airborne odor-guided navigation has been limited, in part because of experimental challenges in measuring detailed information about the odor concentration experienced by the animal. While past work estimated odor cues indirectly or with models (4,18), precise concentration measurements are required to characterize sensorimotor transformation in the sensory environment. Recently, however, new experimental methods to control and monitor the odor landscape experienced by small animals such as the worm (27–29) have made it possible to empirically constrain quantitative models of odor-guided navigation, as we pursue here.
Worms are intrinsically attracted to butanone, a volatile organic compound found in bacterial food in the worm's natural habitat (30). When worms are exposed to butanone paired with food ("appetitive training"), they increase their attraction and are more likely to navigate towards higher butanone concentrations (3,12,17,31). In contrast, when butanone is paired with starvation ("aversive training"), worms decrease their tendency to climb up butanone gradients in comparison to worms without exposure to butanone ("naive") (3,31). The animal's neural and behavioral responses to butanone both change upon learning (12,31). Sensory neuron AWC$^\mathrm{ON}$ responds to butanone (3,31), as well as others (12,32). However, the involvement of downstream interneurons in butanone learning and learned navigation is still unclear.
In this study we seek to answer: (1) How are the worm's navigation strategies altered by olfactory learning? (2) How are sensorimotor transformations altered by learning and how does this vary across a population? (3) What neural substrates may be involved? To answer these questions, we combine precise experimental measurements using our recently developed continuous odor monitoring assay (27) and a novel statistical model to rigorously characterize how butanone associative learning alters odor navigation strategies in worms.
Our measurements and model reveal that the animal's biased random walk is bidirectionally altered by butanone learning and that its weathervaning strategy is down-regulated upon aversive learning. Our approach yields interpretable model parameters that decode training conditions better than the chemotaxis index, and also predict the response to optogenetic perturbation of the sensory neuron AWC$^\mathrm{ON}$. We discover that naive worms have higher behavioral variability and demonstrate context-dependent behavior. We also provide insights into the role of specific interneurons.

Results
Learning bidirectionally alters olfactory navigation. We developed a protocol to train worms to associate butanone with either food (appetitive training) or starvation (aversive training) (Figure 1a). Our protocol was similar to previously reported training regimens (12,17,31) except that ours exposes the animal to multiple rounds of odor paired with starvation instead of only one, which we found increases consistency in learning (Fig. S2). After training, we recorded the movement of populations of worms as they crawled in a defined odor landscape that used metal-oxide sensors to continuously monitor the odor concentration along the boundary (27) (Figure 1b, Fig. S1). We recorded hundreds of locomotory trajectories per plate in this odor environment after different training conditions, for up to 13 plates per training condition. Animals' locomotory trajectories were qualitatively different depending upon learning, with appetitive-trained animals traveling up gradient more often than naive animals, and aversive-trained animals traveling up gradient least of all, broadly consistent with prior reports (3,12,31) (Figure 1c). We quantify the performance of traveling up the gradient using a "chemotaxis index" that is calculated by comparing the number of trajectories going up-gradient versus down-gradient (3,31) (Figure 1d). Performance navigating up gradients is bidirectionally modulated by learning: the chemotaxis index increases after appetitive training and decreases after aversive training compared to naive worms that undergo no training. Interestingly, even animals that undergo aversive training do not, on average, navigate down the gradient, suggesting that after learning an association between butanone and starvation, animals are indifferent or possibly still slightly attracted to butanone.
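The chemotaxis index can be illustrated with a minimal sketch. It assumes the index is the normalized difference between the counts of up-gradient and down-gradient trajectories; the exact normalization is not stated in this section, and the function and argument names are hypothetical.

```python
import numpy as np

def chemotaxis_index(net_up_gradient_displacement):
    """Assumed form: (N_up - N_down) / (N_up + N_down).

    net_up_gradient_displacement: one signed number per trajectory,
    positive if the track ends up-gradient of where it started
    (hypothetical input format, not taken from the paper).
    """
    d = np.asarray(net_up_gradient_displacement, dtype=float)
    n_up = int(np.sum(d > 0))
    n_down = int(np.sum(d < 0))
    total = n_up + n_down
    if total == 0:
        return float("nan")  # no informative trajectories
    return (n_up - n_down) / total

# Made-up example: three tracks go up-gradient, one goes down.
example_ci = chemotaxis_index([1.0, 2.0, -0.5, 3.0])
```

Under this assumed definition the index ranges from -1 (all tracks down-gradient) to +1 (all tracks up-gradient), with 0 indicating no net preference.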
To explore whether worm navigation superficially resembles a biased random walk, we calculated a "run index" by computing the normalized length of a run moving up-gradient, where the run is defined as the period between pirouettes. To explore whether navigation superficially resembles a weathervaning strategy, we calculated a "turn index" that reports the normalized fraction of turn events that result in heading up-gradient (23). Our measurements show that both the run index and the turn index are on average bidirectionally modulated by learning with respect to naive animals, suggesting that learning alters both of these navigational strategies.
While the metrics above suggest hypotheses about how navigational strategies change due to learning, they provide little information about the dynamics of navigation, nor the sensorimotor transformations that govern these dynamics. Specifically, the metrics use only binary information about whether the animal is traveling up or down gradient, and ignore details of the sensory landscape like the odor concentration experienced over time. The indices also provide no information about the behavioral variability across the population. To overcome these limitations, we sought a statistical model that captures temporal information, behavioral noise, and explicitly predicts how the animal changes its movement in response to sensory stimuli.
Odor-dependent mixture model of olfactory navigation. To characterize how olfactory sensory inputs are transformed to behavior under different training conditions, we developed a dynamic Pirouette and Weathervaning (dPAW) statistical model of worm olfactory navigation. The dPAW model consists of a mixture of two navigation strategies: a pirouette behavior consisting of an abrupt change in heading angle (a turn) and weathervaning, which instead continuously modulates heading angle. The dPAW model describes how the worm implements and balances these two behavioral strategies depending on time-varying sensory inputs (Figure 2a). The dPAW model is an extension to a classic biased random walk, where the run intervals are replaced with weathervaning behavior. This framework explicitly models these strategies and is fit to detailed measurements of movement and odor experience. This is to our knowledge the first statistical model that explicitly captures the detailed changes to C. elegans navigation strategies upon butanone learning.
The model worm samples its heading change $d\theta$ at each time step from one of two distributions: either a "weathervaning distribution" $P_{wv}(d\theta)$ or a "pirouette distribution" $P_{pr}(d\theta)$. The weathervaning distribution is narrow, reflecting small changes in heading angle that result from the worm's recent measurements of the concentration gradient. Pirouette behavior, on the other hand, corresponds to large turns that the worm makes when it receives evidence that it is going in the wrong direction. The pirouette distribution is therefore broad, with a peak at $\pm\pi$, indicating a complete reversal in direction. The concentration-dependent "decision" to produce a pirouette in between runs forms a biased random walk (16,22). The worm's decision to initiate a pirouette or to continue to weathervane on each time step is modeled with a Bernoulli generalized linear model (GLM) that takes the filtered history of the odor concentration $C_{1:t-1}$ and the worm's own movement history $d\theta_{1:t-1}$ as inputs. The output of this GLM is a binary variable $\beta_t$ that indicates the presence of a pirouette. Thus, the worm samples its heading change from the pirouette distribution $P_{pr}(d\theta)$ if $\beta_t = 1$ and the weathervaning distribution $P_{wv}(d\theta)$ if $\beta_t = 0$. The full model can be written:

$$P(d\theta_t \mid C_{1:t-1}, dC^{\perp}_{1:t-1}, d\theta_{1:t-1}) = P(\beta_t = 1)\, P_{pr}(d\theta) + P(\beta_t = 0)\, P_{wv}(d\theta \mid dC^{\perp}_{1:t-1}), \qquad [1]$$

where $P(\beta_t = 1)$, given by equation 2, is the mixing probability over the two distributions, with parameters $m$ and $M$ for the minimum and maximum probability of a pirouette on a single time bin, and $K_C$ and $K_h$ corresponding to filters on the past odor concentration $C_{1:t-1}$ and the past absolute angular change $|d\theta_{1:t-1}|$ vectors, respectively. The pirouette and weathervaning distributions, given in turn by equations 3 and 4, are built from $U$, which is uniform in the circular heading, and $f$, which is a von Mises distribution with a mean and precision parameter $\kappa$. In the pirouette distribution, the scalar $\alpha \in [0, 1]$ is the weight on the uniform distribution and $\kappa_{pr}$ is the precision parameter that determines the sharpness of the pirouette. In the weathervaning distribution, the mean is altered according to the perpendicular concentration change $dC^{\perp}_{1:t-1}$ and the precision parameter $\kappa_{wv}$ determines the noise around the head angle. (Note we have made a simplification by allowing the model access to the instantaneous perpendicular odor concentration. In reality, the animal is thought to compute $dC^{\perp}_{1:t-1}$ from sequential measurements in time as it swings its head through space.) We fit dPAW to measurements of animal movement in the odor arena, including the time-varying headings $d\theta_t$, the concentration along the locomotion path $C_t$, and the concentration perpendicular to the locomotion path $dC^{\perp}_t$. All model parameters in dPAW are jointly inferred through a maximum-likelihood method. To validate that parameters are reliably inferred, we simulated example chemotaxis trajectories from pre-defined parameters, fit them by the model, and confirmed that the model accurately recovered the pre-defined parameters (Fig. S3). In the rest of the paper we fit the model to measurements to explore how navigation strategies change with learning.

Figure caption fragment: Shaded area shows standard deviation of the kernel estimate.
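The mixture structure described above can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: the logistic form of the pirouette probability, the sign conventions, the treatment of kernels as plain vectors dotted with a history window, and all function names are assumptions introduced here. Only the ingredients named in the text (bounds $m$ and $M$, a uniform plus von Mises pirouette distribution peaked at $\pi$, a von Mises weathervaning distribution) are taken from the source.

```python
import numpy as np

def pirouette_probability(C_hist, dtheta_hist, K_C, K_h, m, M):
    """Assumed logistic form for P(beta_t = 1), bounded between m and M."""
    drive = np.dot(K_C, C_hist) + np.dot(K_h, np.abs(dtheta_hist))
    return m + (M - m) / (1.0 + np.exp(-drive))

def von_mises_pdf(x, mu, kappa):
    """Von Mises density on the circle with mean mu and precision kappa."""
    return np.exp(kappa * np.cos(x - mu)) / (2.0 * np.pi * np.i0(kappa))

def heading_change_likelihood(dtheta, p_pir, wv_mean, alpha, kappa_pr, kappa_wv):
    """Mixture likelihood of one heading change, following equation 1.

    wv_mean stands in for the weathervaning mean, which the text says is
    set by the filtered perpendicular concentration change (the filter
    and its sign are not specified here, so it is passed in directly).
    """
    p_pr = alpha / (2.0 * np.pi) + (1.0 - alpha) * von_mises_pdf(dtheta, np.pi, kappa_pr)
    p_wv = von_mises_pdf(dtheta, wv_mean, kappa_wv)
    return p_pir * p_pr + (1.0 - p_pir) * p_wv

def sample_heading_change(rng, p_pir, wv_mean, alpha, kappa_pr, kappa_wv):
    """Draw one heading change: pirouette with probability p_pir, else weathervane."""
    if rng.random() < p_pir:
        if rng.random() < alpha:
            return rng.uniform(-np.pi, np.pi)   # uniform component
        return rng.vonmises(np.pi, kappa_pr)    # reversal-peaked component
    return rng.vonmises(wv_mean, kappa_wv)      # narrow weathervaning step

rng = np.random.default_rng(0)
draw = sample_heading_change(rng, p_pir=0.1, wv_mean=0.05,
                             alpha=0.3, kappa_pr=4.0, kappa_wv=20.0)
```

Summing the logarithm of `heading_change_likelihood` over time steps gives a log-likelihood that could be maximized over the parameters, which is the general shape of a maximum-likelihood fit.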

Model captures navigation altered by olfactory learning.
To characterize how olfactory learning alters navigation strategies, we fit the dPAW model to trajectories measured in the butanone odor environment after different training conditions. We first confirmed that the model captures key aspects of the animal's navigation. We confirmed that the fitted model's estimate of pirouette frequency matches that measured empirically (16) (Figure S4b). We also used the model to simulate chemotaxis behavior in the odor environment and confirmed that model-generated trajectories have a chemotaxis index that agrees with measurement and is similarly bidirectionally modulated by training conditions (Figure 2b). Model-generated trajectories also appear visually similar to experimental observations (Figure 2c). In further agreement, simulated trajectories recapitulate sensory and behavioral statistics measured in experiments, including distributions of the worms' heading angle, pirouette rate, experienced perpendicular odor concentration difference, and tangential odor concentration (Fig. S4, S5). Agreement between model and measurement was not due to chance. The dPAW model on average captures 0.3–0.6 bits/s more information, depending on the animal's training condition, about navigation behavior than a null model that lacks any olfactory sensing mechanisms (Figure 3c). For comparison to additional null models see Fig. S5a.
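One common way to express a model comparison in bits/s is the log-likelihood gain over a null model, converted to base 2 and divided by the recording duration. The sketch below assumes that definition and natural-log likelihoods; the paper's exact computation is not given in this section, and the function name and arguments are hypothetical.

```python
import numpy as np

def information_rate_bits_per_s(loglik_model, loglik_null, n_steps, dt):
    """Log-likelihood gain over a null model, in bits per second.

    loglik_model, loglik_null: total natural-log likelihoods of the same
    test data under the full and null models (assumed inputs).
    n_steps: number of time bins; dt: bin width in seconds.
    """
    gain_nats = loglik_model - loglik_null
    return gain_nats / (np.log(2.0) * n_steps * dt)

# Made-up numbers: a gain of 10*ln(2) nats over 10 one-second bins is 1 bit/s.
rate = information_rate_bits_per_s(10.0 * np.log(2.0), 0.0, n_steps=10, dt=1.0)
```

A positive rate means the full model assigns higher probability to the observed heading changes than the null model does.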
Sensorimotor kernels change upon learning. It is unknown whether sensorimotor kernels, $K_C$ and $K_{dC^\perp}$, should necessarily change upon learning (16,18,31). For example, an animal could in principle change its chemotaxis index upon learning by altering its overall pirouette rate and the curvature of its runs without having to alter the kernels that govern its sensorimotor response. To test this, we inspected the kernels inferred from dPAW that were fit on animals that underwent different training conditions. Kernels corresponding to both weathervaning and biased random walk strategies are altered by learning (Figure 2d,e). The weathervaning kernel $K_{dC^\perp}$ has lower weights after aversive training compared to appetitive trained or naive animals (Figure 2d), suggesting that the animal's heading angle is less tightly dependent upon the concentration difference perpendicular to its path. In other words, aversive trained worms don't pay as much attention to perpendicular concentration when choosing their heading angle. There are two extreme ways these worms could not pay attention to the sensory cue: the worm could be uncoordinated and randomly select a heading direction, or alternatively, it could keep its existing heading. We observe the latter. We found that aversive-trained worms alter the behavioral noise and are more likely to preserve their existing heading and exhibit higher persistence length during their runs, as indicated by a higher fitted precision parameter of weathervaning $\kappa_{wv}$ (Figure 3b).
The kernels corresponding to pirouette decisions, $K_C$, change for both aversive and appetitive training compared to naive animals (Figure 2e). The amplitude is increased after appetitive training compared to naive animals, suggesting that these animals' pirouette probability more strongly depends on the concentration change along the navigation path than for naive animals. After aversive training, the kernel has a longer time delay and forms a tri-phasic shape, markedly different from the biphasic shape observed in naive animals, suggesting that aversive trained animals may not be responding as much to downward changes in concentration (Figure 2e).
Collectively, we show that C. elegans alter their chemotaxis upon learning by changing the kernels that govern their sensorimotor response.

Decision function governing pirouettes changes upon learning.
To understand how the animal alters its preference for continuing to weathervane versus interrupting weathervaning with pirouettes, we compared the pirouette decision function in equation 2 inferred from our measurements before and after learning.
We found that learning alters the input-output statistics governing the initiation of a pirouette. This is clear from inspecting the distribution of the filtered signal (Figure 3a, bottom, related to odor concentration and past behavior) that serves as input to the decision function. Appetitive training and aversive training push the tail of the filtered signal probability distribution in opposite directions with respect to the naive condition, which changes the statistics of pirouettes. This could reflect either changes to the kernels, or changes to the environment that those animals prefer to explore.
Aversive-trained worms have a higher baseline turning rate $m$ than appetitive-trained worms, as seen in the output probability (Figure 3a, right), which is the result of passing the filtered odor signal (Figure 3a, bottom) through a nonlinear decision function (equation 2 and Figure 3a, top). Both aversive- and appetitive-trained worms have a higher maximum pirouette rate $M$ compared to naive worms, reflecting a change to the decision function itself.
It is interesting to note that appetitive (but not aversive) trained animals spend more time with their pirouette probabilities in the most sensitive range of the sigmoid (Figure 3a, right, inset), possibly indicating a more efficient strategy for chemotaxis. Capturing these details is one of the ways that dPAW is able to better detect changes in learning.
Model outperforms other metrics at decoding learned experience. The changes to sensorimotor processing detected by the dPAW model all provide information about the animal's past experience. We therefore wondered whether the model could accurately predict the training (aversive, appetitive or naive) that a population of worms had experienced. To test this we inspected dPAW models fit to measured trajectories from each training condition $\gamma \in \{\text{appetitive}, \text{naive}, \text{aversive}\}$, corresponding to fitting parameters $\Theta_\gamma$, and made maximum likelihood predictions of the training condition given held-out test trajectories: $\hat{\gamma} = \arg\max_{\gamma} P(\vec{d\theta} \mid \vec{C}, \vec{dC^{\perp}}; \Theta_\gamma)$. On held-out data, the fitted model correctly predicted the training condition with a performance well above 90%, significantly above chance levels (Figure 4a), given sufficiently long recordings.
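The maximum likelihood decoding step can be sketched as follows. The sketch assumes that, for each candidate training condition, the per-time-step likelihood of the held-out heading changes under that condition's fitted parameters has already been computed; the function names and the input format are hypothetical, and the numbers are made up for illustration.

```python
import numpy as np

def total_log_likelihood(per_step_likelihoods):
    """Sum of log-likelihoods over time steps of a held-out trajectory set."""
    p = np.asarray(per_step_likelihoods, dtype=float)
    return float(np.sum(np.log(p)))

def decode_training_condition(per_step_likelihoods_by_condition):
    """Return the condition whose fitted model gives the highest log-likelihood."""
    scores = {cond: total_log_likelihood(p)
              for cond, p in per_step_likelihoods_by_condition.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Made-up per-step likelihoods of the same test data under three fitted models.
held_out = {
    "appetitive": [0.5, 0.4, 0.6],
    "naive": [0.3, 0.3, 0.3],
    "aversive": [0.1, 0.2, 0.1],
}
predicted, scores = decode_training_condition(held_out)
```

Because log-likelihoods add across time steps, longer recordings separate the conditions more sharply, which is consistent with decoding accuracy improving with the amount of data.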
The inferred kernels and decision function within $\Theta_\gamma$ better reflect the worm's learning than either a classic chemotaxis index that captures the fraction of trajectories that go up-gradient (Figure 4a), or the concentration difference along the track (Fig. S7). Indeed, dPAW always outperforms the chemotaxis index at decoding past training, for any tested amount of finite data. The model's predictive power is derived from observing both the odor history experienced by the animal and the corresponding behavioral responses. Models supplied with either odor history or behavior but not both fail to perform as well.
Some training conditions are more challenging for dPAW to decode than others. Naive worms show the lowest predictive performance with finite data when decoding is performed for each training condition separately (Figure 4b). This is consistent with our estimate of lower information exhibited by the trajectories of naive worms than trained worms (Figure 3c). Practically, this means more measurements are required to capture navigation behavior in naive worms than in trained worms.
The same computations govern natural and optogenetic odor stimuli. We wondered whether the same underlying computations that governed the animal's response to natural odor stimuli also govern its response to optogenetically induced sensory stimuli. Optogenetic stimuli are not bound by the same natural statistics that the worm encounters when exploring a physical odor arena. For example, worms in our arena always experience temporally correlated and slowly varying odor responses. This introduces potentially confounding temporal correlations that can be avoided by using optogenetic stimuli (31,33,34). We therefore investigated the response to optogenetically induced odor sensation after learning (Supplementary Table S2) and adapted our model to incorporate both forms of stimuli.
The animal's behavioral response to optogenetic stimulation was bidirectionally altered by learning (Figure 5, Fig. S9). We measured the animal's absolute change in heading angle $|d\theta|$ in response to optogenetic stimulation of neuron AWC$^\mathrm{ON}$ expressing ChR2 (Figure 5b). AWC$^\mathrm{ON}$ is a butanone-sensitive neuron known to play an important role in learning (12,17,31). We measured the behavioral response to optogenetic stimuli in two ways: we delivered pulses of optogenetic stimuli to animals experiencing odor in the arena (Figure 5, Fig. S8), and we delivered time-varying intensity white noise optogenetic stimuli to animals off odor (Fig. S9). In both cases (Figure 5e and Fig. S9) the animal's turning behavior was more tightly coupled to optogenetic stimuli for appetitive trained animals than naive worms, and less so for aversive trained animals. A change in behavioral response is consistent with prior reports (31), but note that here we make a more stringent comparison by comparing both appetitive and aversive trained animals against the same naive control condition.
We extended our model so that light intensity contributes to the pirouette probability during navigation (equation 5, which retains the baseline term $m$), where kernels $K_{odor}$ and $K_{opto}$ are weights on vectors of odor concentration $C_{1:t-1}$ and light intensity $I_{1:t-1}$, respectively. Since there is no difference along the perpendicular direction for this five-second-long spatially uniform optogenetic pulse, this model is simplified with kernels weighting the tangential concentration and neglecting the perpendicular concentration for weathervaning. Across different training conditions, the optical kernel is close to the mirror image of the odor kernel, most strikingly demonstrated in the naive and aversive conditions (Figure 5e). Therefore the sensorimotor computation inferred from freely moving animals is predictive of the response to external perturbations. The inversion is expected and can partly be explained by the biophysics of the AWC$^\mathrm{ON}$ neuron: it hyperpolarizes when odor is present and depolarizes when odor is removed. This gives us confidence that our findings about sensorimotor processing derived from natural odor stimuli should be relevant to the larger literature based on more artificial stimulation. We therefore will leverage optogenetics to probe behavioral variability during odor navigation.
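A minimal sketch of how a light kernel could enter the pirouette probability, and of what a "mirror image" kernel implies. The logistic form, the sign convention, and the function name are assumptions made for illustration; only the kernel names, the inputs, and the baseline $m$ come from the text.

```python
import numpy as np

def pirouette_probability_opto(C_hist, I_hist, K_odor, K_opto, m, M):
    """Assumed logistic pirouette probability driven by odor and light history."""
    drive = np.dot(K_odor, C_hist) + np.dot(K_opto, I_hist)
    return m + (M - m) / (1.0 + np.exp(-drive))

# Made-up kernel and a step stimulus; the light kernel is the exact
# mirror image of the odor kernel (an idealization of Figure 5e).
m, M = 0.02, 0.2
K_odor = np.array([0.5, -0.2, 0.1])
K_opto = -K_odor
step = np.array([0.0, 1.0, 1.0])
none = np.zeros(3)

p_odor_step = pirouette_probability_opto(step, none, K_odor, K_opto, m, M)
p_light_step = pirouette_probability_opto(none, step, K_odor, K_opto, m, M)
```

With exactly mirrored kernels, an odor step and a light step of the same shape move the pirouette probability by equal amounts in opposite directions around the midpoint $(m + M)/2$, so the two probabilities sum to $m + M$.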
Learning modulates behavioral variability in response to sensory perturbation. We sought to characterize the variability in learned odor-guided navigation, because variability is a known feature of sensory processing in worms (35). We observe variability across collections of trajectories in learned navigation. For example, the kernel $K_C$ that governs the timing of pirouettes varies across subsamples of our data, and seems to vary more for aversive-trained than appetitive-trained or naive animals (Fig. S6).
Surprisingly, we discovered that animals respond to stimuli differently depending on whether they are traveling up or down an odor gradient and that this contributes to variability (Figure 5c,d). Naive worms respond more strongly to optogenetic impulses delivered when traveling up an odor gradient than when traveling down the gradient (Figure 5c). Appetitive trained worms, by contrast, respond consistently to optogenetic impulses regardless of their direction of travel along the odor gradient. Aversive trained worms show a weak response to optogenetic impulses regardless of the gradient context. This indicates that learning alters behavioral variability, and that a response to stimuli can be context-dependent along the navigation trajectory.

Downstream interneurons differentially contribute to learned chemotaxis. We investigated interneurons that may be involved in implementing learned changes to navigation strategy. We focused on interneurons downstream of the odor-sensing neuron AWC$^\mathrm{ON}$ because optogenetically induced activity in AWC$^\mathrm{ON}$ is sufficient to recapitulate aspects of learned navigation (Figure 5b) and AWC$^\mathrm{ON}$ is known to play an important role in odor sensing (11), navigation (36–38), and butanone learning (3,31). We selected five interneuron subtypes with direct synaptic inputs (chemical or electrical connections) from AWC$^\mathrm{ON}$ (10,13): AIA, AIB, AIZ, AIY, and RIA, several of which are known to be involved in navigation (18,19,23,31,37,39). For each neuron subtype we compared learned odor-guided navigation in wild-type animals to that of mutants for which the neuron subtype was genetically ablated (via miniSOG, Supplementary Table S1) or down-regulated (via expressing an activated potassium channel), as described in the methods (Figure 6).
Ablating or down-regulating the downstream interneurons had wide-ranging and statistically significant effects on chemotaxis performance and inferred model parameters, including after learning (Figure 6b; Fig. S10, Fig. S11). Here we characterize chemotaxis performance with a weighted chemotaxis index (wCI) that differs from the chemotaxis index in Figure 1d by more heavily weighting trajectories that experience a large change in odor concentration and de-emphasizing tracks that experience little odor change (described in the methods). To capture changes in inferred model parameters we report a stimuli-normalized magnitude of the kernels $K_C$ and $K_{dC^\perp}$, which correspond to the extent to which the animal uses the biased random walk (BRW) or weathervaning (WV) strategies, respectively (Figure 6b). Ablation of neuron subtype AIZ was most severe and eliminated gradient climbing behavior across all three conditions (wCI close to zero), consistent with its reported role in salt chemotaxis (18). In general, though, neuron subtypes contributed differently to chemotaxis performance and learning, or had different contributions to the inferred kernels corresponding to each navigation strategy. For instance, AIA, AIB and AIY defective animals showed little difference between naive and aversive training conditions, but did increase chemotaxis performance after appetitive training. The RIA defective animals have similar chemotaxis performance upon appetitive and aversive learning, suggesting that RIA may be involved in fine-tuning heading angle after learning experience. Interestingly, the RIA defect results in much lower gradient climbing performance for naive animals. This suggests that RIA may be related to both learning and sensory adaptation in odor navigation.
We systematically quantified each neuron's contribution to the changes in the inferred behavioral strategies with a linear model (Figure 6c): $B = WA$, where $B$ is the matrix with behavior readouts (BRW and WV indices) along the rows and ablation conditions along the columns, $W$ is an unknown weight matrix from neuron to behavior, and $A$ is the binary neural ablation matrix with ones on the off-diagonal elements. Matrix $A$ is full-rank, which makes the linear model solvable. We normalized the rows in $B$ to better compare weights across behaviors.
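The linear model can be illustrated with a short sketch. It assumes a square 5 × 5 ablation matrix with one column per single-neuron ablation (zero for the disrupted neuron, one for the intact ones) and uses made-up weights; the variable names are hypothetical and the row normalization of $B$ is omitted.

```python
import numpy as np

neurons = ["AIA", "AIB", "AIZ", "AIY", "RIA"]
n = len(neurons)

# Column j describes the strain in which neuron j is disrupted:
# zero on the diagonal (disrupted), ones elsewhere (intact).
A = np.ones((n, n)) - np.eye(n)

def solve_weights(B, A):
    """Solve B = W A for W by solving the transposed system A^T W^T = B^T."""
    return np.linalg.solve(A.T, B.T).T

# Made-up ground-truth weights for two readouts (BRW and WV indices).
rng = np.random.default_rng(0)
W_true = rng.normal(size=(2, n))
B = W_true @ A                      # synthetic behavior readouts
W_est = solve_weights(B, A)         # recovered neuron-to-behavior weights
```

Because this $A$ has eigenvalues $n - 1$ and $-1$, it is invertible, so each row of $W$ is uniquely determined by the corresponding row of $B$.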
The majority of the weights $W$ were positive (Figure 6c), indicating that a neuron contributes to a given navigation strategy for a given training condition and that, conversely, disrupting the neuron down-regulates the navigation strategy. Interestingly, some neurons, such as RIA, had negative weights. Unlike wild-type animals, which decrease their weathervaning upon aversive training, RIA-defective animals fail to adjust the extent of their weathervaning upon aversive training. This suggests that RIA may act in a learning-dependent manner to decrease or increase odor-dependence of head angle, consistent with RIA's known role in controlling head motion (39).
The patterns of neuron weights were similar for naive and appetitive conditions, which both corresponded to wild-type positive chemotaxis. The patterns of neuron weights for aversive behavior, however, were quite distinct, suggesting that the neural mechanisms of aversive learning may differ from those of appetitive learning.
Interestingly, no neurons had non-zero weights exclusively on one strategy (e.g. biased random walk) but not the other (e.g. weathervaning), suggesting that even though these two strategies are mathematically and behaviorally distinct, they do not segregate cleanly into different sub-populations of neurons. Instead, the neural bases for learning-dependent weathervaning and biased random walks are both distributed across the neural population.

Discussion
We combined precise experimental measurements and a novel statistical model, dPAW, to characterize learned olfactory navigation in worms. The results show that (1) navigation strategies are bidirectionally altered by learning, (2) the dPAW model decodes the animal's past training conditions based on the observed navigation behavior and outperforms a classic chemotaxis metric, (3) behavior is more variable in naive worms and this variability can in part be explained by a context-dependent response to odor, and (4) interneurons downstream from AWC$^\mathrm{ON}$ contribute to learned navigation strategies in a distributed manner.
To reach these conclusions, two experimental advances were critical. The first was the measurement of navigation in a precisely defined sensory environment enabled by our recently developed odor delivery system (27). Previous studies had analyzed navigation based on the proximity to a droplet source (3,12,31), but in those experiments the precise odor concentration experienced by the animal was typically not known. With the odor delivery system, we obtained the odor concentration experienced by the animal and therefore were able to fit more comprehensive models of sensorimotor processing.
The second methodological advance is the development of a more robust training protocol to probe bivalent learning (aversive and appetitive) that shows clearer effects when tested in the same odor concentration range. Prior investigations into butanone learning probed different valence learning (appetitive vs. naive, or aversive vs. naive) in different concentration regimes (3,17,31,40), possibly because the behavioral effects of aversive learning are known to be more pronounced at lower concentrations of butanone, while those of appetitive learning are known to be more pronounced at higher concentrations (see for example (12) and results in (41)). Here, however, we sought to directly compare detailed properties of learning, such as kernels, in the same sensory environment. To do so, we changed the training protocol (more repetitions during aversive than appetitive training) in order to achieve a greater difference in learning outcomes. This produced large bivalent training-dependent changes to chemotaxis that were visible even in the same sensory environment (Figure 1).
The dPAW model fitted to measurements provides new insight into how the animal responds to time-varying sensory stimuli. In particular, we find that learning alters the temporal kernels for odor input. For instance, appetitive training sharpens the tangential concentration kernel (Figure 2). By contrast, classical approaches have missed this important change because they implicitly have static sensing kernels and a priori fix the time window used to calculate gradients (16,18,19,23).
Our measurements and model reveal that learning alters not only the sensing mechanism but also the statistics of behavioral noise (Figure 3), which is consistent with recent work showing that starvation and neuromodulation can dramatically alter the statistics of behavior (42,43). Our finding was possible only because dPAW explicitly includes parameters that control the noise level of the behavioral output, which is another advantage over past work that often excludes noise and relies on predetermined parameters (16,18,44). Future work is needed to pinpoint the source of behavioral stochasticity, such as noise in the sensory or motor circuits, and its potential functional roles in navigation and exploration (35,45).
A strength of our approach is that it allows us to learn properties of sensorimotor kernels from either artificial optogenetic stimulation or natural odor stimuli. Past work, by contrast, required choosing one approach or the other (31,33,36). Optogenetic stimulation has advantages because it can be finely manipulated to deliver rich, informative stimuli, for example white noise stimuli to study sensory encoding (46,47). But it is challenging to connect optogenetic stimuli to natural stimuli experienced during navigation. Our work shows directly that information from one approach is compatible with the other. We applied statistical inference directly to navigational trajectories in the presence of odor and recovered sensorimotor transformations qualitatively similar to those we inferred using optogenetic perturbation (Figure 5). This is, to our knowledge, the first direct comparison between sensorimotor computation during navigation and the response to external perturbation.
An important conclusion from optogenetic perturbations is that the worm's response variability is altered by learning. We found that naive worms are more variable and that learning reduces variability in behavioral strategies across the population of worms. Surprisingly, we further found that the variation across optogenetic responses in naive worms can in part be explained by the context of that worm's odor environment, up or down the gradient (Figure 5). Naive worms that travel up gradient pay more attention to both odor and optogenetic stimuli (Figure 5c, Fig. S8) compared to those that go down gradient. This is broadly consistent with prior literature describing how the result of associative learning can be heterogeneous across animals or non-stationary in time (48)(49)(50). Measurements in response to optogenetically perturbing AWC$^\mathrm{ON}$ suggest that the source of such context-dependent behavior might be along the AWC$^\mathrm{ON}$ pathway. Future work is needed to address the source of such variability in the nervous system, as well as its possible functional role for navigation and searching behavior (1,35).
Of the interneurons we investigated, all five classes contribute to both learned navigation strategies (Figure 6). Importantly, many connections from AWC$^\mathrm{ON}$ to the first layer of interneurons are shared with the salt sensing neurons ASER/L and the thermal sensing neuron AFD (18,19,23). Some past work hypothesizes that experience-dependent changes are localized in specific neurons; for example, AIB has been shown to play a role in context-dependent thermal navigation (19). But other work presents evidence that learning effects can be distributed across many neurons (12,23). For instance, AIB, AIY, and AIZ all seem to be involved in learned salt chemotaxis strategies (23). Our findings agree with these results in showing that all three interneurons have non-zero and learning-dependent weights for both behavioral strategies.
One puzzling finding is that we sometimes notice changed behavioral strategies but an unchanged chemotaxis index. For example, appetitive trained worms with defects in AIB have dramatic reductions in both weathervaning and biased random walk strategies but still show reasonable chemotaxis performance. This may be due to larger model mismatch in mutant worms with strategies deviating from dPAW. Large variability and less consistent behavioral strategies were also observed in earlier ablation studies (18). We explore this further in the supplement (SI Limitations and future work).
In this work we utilize a controlled odor environment and an innovative model to characterize learned odor navigation in worms. The combined approach of precisely delivered sensory stimuli, behavior quantification and navigational modeling continues to be a powerful approach (9,51,52) that can be generalized to other sensory modalities and species to study adaptive sensory navigation. By identifying the specific features of behavior that are altered by learning, our investigation lays the groundwork for follow-up neural imaging studies and will guide our search for neural representations of learning (53,54).

Materials and Methods
Worm strains and preparation. All worms were maintained at 20 °C on nematode growth medium (NGM) agar plates seeded with E. coli (OP50). We used N2 Bristol as wild-type worms. A detailed strain list is provided in Supplementary Information (Table S1). Strain AML105, used for optogenetics experiments, was integrated from strain CX14418 (31) using UV irradiation and was outcrossed six times before testing. The AIB(-) strain is from (27) and the AIA(-), AIY(-), AIZ(-) and RIA(-) strains are from (19).
Before chemotaxis experiments, we bleached and centrifuged batches of worms to synchronize the next generation as described in (55). L1 synchronized worms were plated onto seeded NGM plates on the next day. Experiments were conducted 3 days after seeding, which corresponds to synchronized 1-day-old adult worms. For optogenetic strains, L1 stage worms were plated onto 9 cm NGM agar plates seeded with 1 ml OP50 food with 10 µl all-trans retinal (from 100 mM stock). For interneuron perturbation, miniSOG strains were treated with square wave pulses of 2.16 mW/mm$^2$ 450 nm blue light at 1 Hz for 30 minutes at L1 stage and then allowed to recover before testing.
Olfactory learning protocol. The olfactory learning protocol was adapted from previous literature (3,12,17,31). Synchronized young adult worms were removed from food and washed three times with S. Basal solution (11). Appetitive trained worms were suspended in 10 ml of S. Basal solution on a shaker to starve for 1 hour. After starvation, worms were placed on a 9 cm NGM agar plate with 1 ml of OP50 and 12 µl of pure butanone (2-Butanone, +99%, Extra Pure, Thermo Scientific) dropped on the interior of the lid and sealed with Parafilm. To hold butanone droplets in place, we placed 3 agar plugs on the lid and dropped 4 µl of butanone onto each plug. To conduct aversive training, worms were suspended in 10 ml of S. Basal with 1 µl butanone added. The tube was sealed and placed on the shaker for 1 hour. We found that the aversive training protocol was most robust with repetition (Fig. S2), so we interleaved the sessions by plating the worms back on food for 30 minutes, then repeating the training three times (Figure 1a). Worms were washed three times with S. Basal solution and centrifuged between each transfer during the training protocol. For the naive condition, worms were removed from food and washed three times immediately before testing.
Odor delivery system and chemotaxis experiments. We used the odor flow delivery system and experimental protocols developed in (27) to measure chemotaxis trajectories after different training conditions. In short, this system incorporates controlled odorized airflow and continuously measures odor concentration along the boundary of an agar plate during animal experiments. As in (27), we calibrated the full array of metal-oxide based sensors with a downstream photo-ionization detector (Fig. S1a) to characterize the steady-state spatial profile (Fig. S1b). During animal experiments, we swapped out sensors in the middle of the arena to place worms on an agar plate and continued to measure the boundary condition to confirm that the odor landscape is controlled and stable across the arena (Fig. S1c), as in (27).
For chemotaxis experiments, we used 1.6% agar with salt content matching S. Basal solution in a 10 cm square plate lid. We used 11 mM butanone dissolved in water as the odor reservoir. Moisturized clean air served as the background flow. The background airflow was 400 ml/min and the butanone odor flow was 33-36 ml/min. After the pre-equilibration protocol (27) that brings the agar plate to steady state in the odor environment, 50-100 worms were placed in the middle of the agar plate and dried with Kimwipes. Each chemotaxis session was recorded for 30 minutes.
Behavioral imaging and optical setup. Worm behavior imaging was performed as described in (27). Briefly, a CMOS camera measured worm behavior at 14 Hz. Worms were illuminated with 850 nm light. Images were captured with a custom-written LabVIEW program and analyzed with Matlab scripts. An important difference from (27) is that here we added the ability to deliver optogenetic stimulation. Three 455 nm LEDs (M455L4, Thorlabs) were fixed on top of the flow chamber to deliver stimulation. The intensity was calibrated in the field of view with a photometer, with 85 µW/mm$^2$ for each uniform light impulse. Each pulse lasted 5 seconds and pulses were delivered every 30 seconds. Here we analyzed 1220-3060 pulses delivered to worms treated with retinal in each training condition.
Behavioral analysis and dPAW inference. We tracked the location of the worm's centroid and fit a centerline to its posture as described in (27). We removed trajectories that were shorter than 1 minute or had a displacement of less than 3 mm across the recording. In addition, trajectories that started above 70% of the maximum odor concentration were removed to prevent double-counting worms that may have already started or traveled up-gradient. This results in 270-1,140 animal hours of chemotaxis trajectory per training condition for model fitting. For each processed navigation trajectory, we computed the displacement vectors every 5 time bins (5/14 seconds) and computed the angle between consecutive vectors to obtain $d\theta_t$. We computed the odor concentration $C_t$ that the worm passes through using the two-dimensional odor landscape measured in the flow chamber. The perpendicular concentration difference $dC^\perp$ is calculated with unit vectors that are orthogonal to each displacement vector. We also recorded the length of each displacement vector to form the empirical speed distribution.
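The trajectory preprocessing described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the odor landscape is passed as a hypothetical callable `conc(x, y)`, the probe distance `eps` and the left-minus-right sign convention for $dC^\perp$ are assumed, and angles are returned in radians.

```python
import numpy as np

def heading_changes(xy, step=5):
    """Signed angle (radians) between consecutive displacement vectors.

    xy   : (T, 2) array of centroid positions at the camera frame rate.
    step : frames per displacement vector (5 frames at 14 Hz in the text).
    """
    pts = xy[::step]
    d = np.diff(pts, axis=0)                         # displacement vectors
    ang = np.arctan2(d[:, 1], d[:, 0])               # heading of each vector
    dtheta = np.diff(ang)
    dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
    speed = np.linalg.norm(d, axis=1)                # length of each vector
    return dtheta, d, speed, pts

def perpendicular_difference(conc, pts, d, eps=1.0):
    """Concentration difference across the direction of motion.

    conc : callable (x, y) -> concentration (hypothetical landscape lookup).
    eps  : assumed probe distance on either side of the track.
    Assumes no zero-length displacement vectors.
    """
    unit = d / np.linalg.norm(d, axis=1, keepdims=True)
    normal = np.stack([-unit[:, 1], unit[:, 0]], axis=1)  # rotate by +90 deg
    p = pts[:-1]                                          # start of each vector
    left, right = p + eps * normal, p - eps * normal
    return conc(left[:, 0], left[:, 1]) - conc(right[:, 0], right[:, 1])

# Toy example: a worm moving along +x in a landscape that increases with y.
xy = np.stack([np.arange(51.0), np.zeros(51)], axis=1)
dtheta, d, speed, pts = heading_changes(xy)
dC_perp = perpendicular_difference(lambda x, y: y, pts, d)
```

In practice the landscape lookup would interpolate the measured two-dimensional concentration map, and trajectories with stationary segments would need to be handled before normalizing the displacement vectors.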
We fit dPAW to the ensemble of trajectories by maximizing the log-likelihood:

$$\arg\max \sum_{n}^{N} \sum_{t}^{T} \log P\left(d\theta_{n,t+1} \mid C_{n,:t},\, dC^{\perp}_{n,:t},\, d\theta_{n,:t}\right)$$

where $N$ is the number of trajectories, $T$ is the number of time steps, and $\lambda$ is the regularization term. To impose smoothness on the kernels, $K_C$ is parameterized with 4 raised-cosine basis functions (46), while $K_{dC^\perp}$ and $K_h$ are parameterized with an exponential form. Optimization was performed with constrained optimization in Matlab, where the constraints are positivity of the precision parameters and sigmoid probabilities. Uncertainty about the inferred parameters is characterized by numerically computing the Hessian of the log-likelihood function around the maximum likelihood estimate. For model-based decoding, we perform 7-fold cross validation with all measured trajectories. To test with finite data, we subsample 10 ensembles of trajectories to test for performance in Figure 4.
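As an illustration of Hessian-based uncertainty estimates, the sketch below computes a finite-difference Hessian of a negative log-likelihood and converts it to standard errors under a Gaussian approximation. It is not the Matlab routine used in the study: the function names, the step size `h`, and the toy Gaussian likelihood are assumptions chosen so the result can be checked analytically.

```python
import numpy as np

def numerical_hessian(f, theta, h=1e-3):
    """Central finite-difference Hessian of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4 * h * h)
    return H

def standard_errors(neg_loglik, theta_hat):
    """Standard errors from the inverse Hessian of the negative log-likelihood,
    evaluated at the maximum likelihood estimate theta_hat."""
    cov = np.linalg.inv(numerical_hessian(neg_loglik, theta_hat))
    return np.sqrt(np.diag(cov))

# Toy check: mean of Gaussian data with known sigma = 2 and n = 100 samples.
# The analytic standard error is sigma / sqrt(n) = 0.2.
rng = np.random.default_rng(0)
data = rng.normal(1.0, 2.0, size=100)
nll = lambda th: 0.5 * np.sum(((data - th[0]) / 2.0) ** 2)
se = standard_errors(nll, [data.mean()])
```

The same covariance matrix can serve as the Gaussian approximation from which parameter samples are drawn around the maximum likelihood estimate.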
To validate the accuracy of maximum likelihood inference for dPAW model parameters, we simulated behavioral time series from dPAW with a fixed set of ground-truth parameters. The model generated simulated trajectories using Gaussian white noise for the concentration time series $C_t$ and $dC^\perp_t$. The time series has 50,000 data points, which is on the same scale as our data length (∼300 animal hours of recording sampled every 5/14 seconds). We confirmed that the inference procedure works with simulated data and recovers the ground-truth parameters (Fig. S3).
To generate chemotaxis trajectories from inferred parameters shown in Figure 2, we conduct agent-based simulation by measuring concentration in the same odor landscape and drawing the angular change $d\theta_t$ from dPAW. We simulate two-dimensional navigation trajectories with:

$$x_{t+1} = x_t + v_t \cos(\theta_t) \qquad [7]$$

$$y_{t+1} = y_t + v_t \sin(\theta_t) \qquad [8]$$

$$\theta_{t+1} = \theta_t + d\theta_t \qquad [9]$$

where $v_t$ is the speed drawn from a Gaussian fit to the empirical distribution.
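The update rule in equations 7-9 can be sketched as a short simulation loop. The dPAW sampler itself is not reproduced here; `sample_dtheta` is a hypothetical placeholder that receives the position history and a random generator, and the toy sampler below simply draws Gaussian heading noise.

```python
import numpy as np

def simulate_track(sample_dtheta, n_steps, speed_mean, speed_std, rng,
                   x0=0.0, y0=0.0, theta0=0.0):
    """Iterate equations 7-9 for n_steps.

    sample_dtheta : callable (x_hist, y_hist, rng) -> dtheta in radians;
                    stands in for a draw from the fitted model.
    speed_mean, speed_std : parameters of the Gaussian fit to speed.
    """
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    theta = np.empty(n_steps + 1)
    x[0], y[0], theta[0] = x0, y0, theta0
    for t in range(n_steps):
        v = rng.normal(speed_mean, speed_std)        # speed v_t
        x[t + 1] = x[t] + v * np.cos(theta[t])       # equation 7
        y[t + 1] = y[t] + v * np.sin(theta[t])       # equation 8
        theta[t + 1] = theta[t] + sample_dtheta(x[: t + 1], y[: t + 1], rng)  # equation 9
    return x, y, theta

# Toy sampler (an assumption, not dPAW): small Gaussian heading noise.
toy_sampler = lambda x_hist, y_hist, rng: rng.normal(0.0, 0.2)
rng = np.random.default_rng(1)
x, y, theta = simulate_track(toy_sampler, n_steps=500, speed_mean=0.1,
                             speed_std=0.02, rng=rng)
```

A model-based sampler would look up the concentration along `x_hist` and `y_hist` in the measured odor landscape before drawing the heading change.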
For information rate (Figure 3) and model comparison (Figure 4), we construct null models to compare with dPAW. The random walk model has a similar statistical structure for behavior but is independent of odor concentration:

$$P(d\theta_t) = P(\beta = 1)\,P_{pr}(d\theta) + P(\beta = 0)\,P_{run}(d\theta) \qquad [10]$$

Note that in this null model, the turning probability is time-independent, so $\beta$ does not have a time subscript $t$. The pirouette behavior has a similar sharp angle as in the full model, but in the null model the weathervaning is changed to "runs" that have zero mean and do not take perpendicular concentration change into account. This model was fitted to the same ensemble of trajectories, and the log-likelihood difference between the full dPAW and this null model is normalized by log(2) per unit time to compute the bit rate.
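The bit-rate calculation amounts to a short formula, sketched below. It assumes the log-likelihoods are total values in nats over the same data and that "per time" means per second of recording; the function and argument names are illustrative.

```python
import numpy as np

def information_rate(loglik_full, loglik_null, n_steps, dt):
    """Bits per second gained by the full model over the null model.

    loglik_full, loglik_null : total log-likelihoods (natural log) on the same data.
    n_steps : number of time steps in the data.
    dt      : duration of one time step in seconds (5/14 s in the text).
    """
    delta_bits = (loglik_full - loglik_null) / np.log(2)  # nats -> bits
    return delta_bits / (n_steps * dt)

# Example: the full model gains 100 bits over 1000 steps of 5/14 s each.
rate = information_rate(-5000.0 + 100 * np.log(2), -5000.0, 1000, 5 / 14)
```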
We also used this model to compute the behavior-only model prediction in Figure 4. The chemotaxis model is formulated with a binomial distribution with expected fraction $p$ of tracks going up gradient. The estimate for the chemotaxis index is then $2p - 1$. Lastly, the concentration change model takes $C_{final} - C_{initial}$ for all tracks and uses a naive Bayes classifier for prediction.
Statistical analysis for chemotaxis in mutant worms. We computed the concentration-weighted chemotaxis index: $(N_{up}\Delta C_{up} - N_{down}\Delta C_{down})/(N_{up}\Delta C_{up} + N_{down}\Delta C_{down})$, where $N$ is the number of tracks going up or down gradient and $\Delta C$ is computed for every track. We find the maximum and minimum concentration across the full observed odor landscape, then define $\Delta C_{up} = C_{max} - C_{final}$ for tracks going up gradient and $\Delta C_{down} = C_{final} - C_{min}$ for tracks going down gradient, with ending concentration $C_{final}$.
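A minimal sketch of both indices follows, implementing the definitions as written above. Two points are interpretations rather than statements from the text: a track counts as going up gradient when its final concentration exceeds its initial one, and the product $N \Delta C$ is taken as the sum of per-track $\Delta C$ values.

```python
import numpy as np

def chemotaxis_indices(c_initial, c_final, c_min, c_max):
    """Return (classic CI, weighted CI) for a set of tracks.

    c_initial, c_final : per-track starting and ending concentrations.
    c_min, c_max       : extremes of the observed odor landscape.
    """
    c_initial = np.asarray(c_initial, dtype=float)
    c_final = np.asarray(c_final, dtype=float)
    up = c_final > c_initial                 # assumed up-gradient criterion
    n_up, n_down = up.sum(), (~up).sum()
    ci = (n_up - n_down) / (n_up + n_down)   # classic index, equal to 2p - 1

    w_up = np.sum(c_max - c_final[up])       # N_up * mean(dC_up), as defined
    w_down = np.sum(c_final[~up] - c_min)    # N_down * mean(dC_down)
    wci = (w_up - w_down) / (w_up + w_down)
    return ci, wci

# Toy example with four tracks in a landscape spanning concentrations 0 to 4.
ci, wci = chemotaxis_indices([1, 1, 1, 1], [2, 3, 0.5, 2], c_min=0.0, c_max=4.0)
```

The sketch does not guard against the degenerate case where both weighted sums are zero.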
To conduct statistical tests between indices in Figure 6, we resampled chemotaxis trajectories in each condition (Fig. S11). For the weighted chemotaxis index, we sampled 50 tracks 20 times to compute the standard deviation from the mean. For the biased random walk strategy, we computed $|K_C|/\mathrm{std}(C_t)$ to quantify the weights on concentration $C$, normalized by the standard deviation of the time series itself to compare across strains that experience different concentration inputs. For the weathervaning strategy, we computed $\mathrm{std}(K_{dC^\perp} * dC^\perp_{:t})$ to characterize how much the worm weights and corrects its heading in response to $dC^\perp$. To conduct statistical tests on the behavioral strategies, we sampled 100 times from the posteriors of these kernels, with a Gaussian approximation around the maximum likelihood estimate shown in Figure 2b,c. For each transgenic strain and training condition, we computed the standard deviation of the metrics and conducted a t-test between the transgenic strain and wild-type worms (Fig. S11).
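The two strategy indices can be sketched as below. The sketch assumes that $|K_C|$ denotes the Euclidean norm of the discretized kernel and that $*$ denotes a causal convolution in which kernel index 0 weights the most recent sample; neither convention is stated explicitly in the text.

```python
import numpy as np

def brw_index(k_c, conc):
    """Biased random walk index: kernel norm scaled by the stimulus spread."""
    return np.linalg.norm(k_c) / np.std(conc)

def wv_index(k_dcp, dcp):
    """Weathervaning index: spread of the filtered perpendicular signal."""
    # Causal filtering: output at time t uses dcp[t], dcp[t-1], ...
    filtered = np.convolve(dcp, k_dcp, mode="full")[: len(dcp)]
    return np.std(filtered)

# Toy values chosen so the results are easy to verify by hand.
brw = brw_index(np.array([3.0, 4.0]), np.array([0.0, 2.0]))
wv = wv_index(np.array([1.0]), np.array([1.0, -1.0, 1.0, -1.0]))
```

Repeating these calculations on kernels sampled from the Gaussian posterior approximation would give the distributions that enter the t-test.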
Mapping the behavior-triggered average with optogenetics. For behavior-triggered averages (Fig. S9) we delivered time-varying LED light stimuli drawn from $\mathcal{N}(30, 30)$ (µW/mm$^2$) at 14 Hz with 0.5 s correlation time and bounded between 0-60 µW/mm$^2$. This serves as a white noise stimulus. We computed the behavior-triggered average (BTA) for reversals following methods applied to touch sensation in worms (55).

The distribution of inter-turn intervals observed from data, compared to trajectories simulated from the inferred dPAW with or without kernel $K_h$. It has been reported that worms produce sharp turns in bouts with specific time scales, and the original definition of a pirouette included events defined by multiple sharp turns close together in time (2). The results show that the dPAW model can generate tracks that have similar kinetics with the history kernel and without explicitly modeling state transitions.

Optogenetic kernel weights co-vary with odor kernel weights in naive but not learned conditions. To examine the relation between odor and optogenetic sensorimotor processing, we sub-sampled trajectories recorded from chemotaxis with optogenetic perturbation. For each set of sub-sampled navigation trajectories, we fit the statistical model for pirouette probability (equation 5) and computed the norm of kernels $K_{opto}$ and $K_{odor}$. We re-sampled ensembles of 100 trajectories from 300-1000 trajectories across three learning conditions. The scatter plots compare the norm of kernels $K_{opto}$ and $K_{odor}$ inferred from sub-sampled datasets. The scattered samples show weak but significant correlation in the naive condition. This means that subsets of trajectories that place less weight on the odor signal also show a weaker optogenetic response. In contrast, aversive trained worms have small, clustered weights on odor that are uncorrelated with the optogenetic input. Together, the sub-sampling result is consistent with the observation in Figure 5 and supports the finding that tracks going down gradient (less weight on odor input) also respond less to optogenetic input.

Table S2.

Fig. 1. Bidirectional olfactory learning in C. elegans. (a) Protocol for butanone associative training in worms. (b) After exposure to different training regimens in (a), worms' olfactory navigation is measured in a controlled odor environment. (c) Example trajectory after three different training conditions. (d) Summary statistics of learning across three conditions. The chemotaxis index is the normalized number of tracks going up gradient $N_u$ versus down $N_d$: $\frac{N_u - N_d}{N_u + N_d}$, which corresponds to the classic chemotaxis

Fig. 2. dPAW captures learned olfactory navigation in worms. (a) Schematic of the dPAW model. An example data trajectory is shown on the far left, providing time series of concentration and angle changes. Response kernels and decision functions are fitted to the time series data. The Bernoulli process $\beta$ leads to two parallel strategies: biased random walk and weathervaning. (b) Chemotaxis index of simulated trajectories with inferred parameters. Error bars show standard error of the mean across 10 repeated simulations, each with 100 trajectories. (c) Example trajectories simulated from each training condition. (d) Kernel $K_{dC^\perp}$ and (e) kernel $K_C$ fitted to three training conditions.

Fig. 3. Learning alters pirouette decision and behavioral noise. (a) The top panel shows the decision function $P(\beta = 1 \mid C, d\theta)$ of three training conditions as a function of the filtered signal: $K_C \cdot C + K_h \cdot |d\theta|$. The distribution of the filtered signal is shown in the bottom panel. The right panel shows the distribution of the output pirouette rate, with the inset showing the same distribution in log scale. (b) The precision parameter for weathervaning, $\kappa_{wv}$, across three training conditions. (c) The information rate given fitted model parameters and data across three training conditions. Error bars show standard error of the mean across 10 sampled batches of trajectories.

Fig. 4. Model-based decoding of learning. (a) Model performance classifying the prior training of a population of worms (aversive, appetitive or naive) as a function of mean data length. Four models are compared and error bars show standard error of the mean across 7-fold cross validation and 10-fold sampling across data length. w/o concentration: model that only captures behavioral statistics and does not account for odor input. w/o behavior: model for concentration difference along each trajectory that does not account for behavioral output. Chance level, 33%, is shown as a grey dashed line. (b) dPAW-based decoding as a function of data length as in (a), but here performance for each training condition is reported separately.

Fig. 5. Response to optogenetically induced odor sensation is altered by learning and depends on odor gradient context. (a) Optogenetic stimulation is delivered to animals expressing ChR2 in AWC$^\mathrm{ON}$ as they crawl in an odor-flow chamber. (b) Change in the absolute heading $|d\theta|$ upon optogenetic impulse across the three training conditions. Change is computed between the average response during the 5 s impulse and the 5 s before it. We record from 4-7 plates per condition. Error bars show standard error of the mean across over 1000 impulses per condition. The results of t-test comparisons between training conditions are shown with *** for p < 0.001. (c) Same measurements as (b) upon optogenetic impulse, but with the context of gradient direction. Within each training condition, measurements are separated into trajectories going up or down gradient. The results of t-test comparisons between contexts are shown with *** for p < 0.001 and NS for non-significance. (d) Time-varying average angular speed aligned to the optogenetic stimuli. Three panels show different training conditions. Solid and dashed lines indicate up and down gradient trajectories. Gray shading shows standard error of the mean across traces. (e) Temporal kernels for odor $K_C$ and optogenetic $K_{opto}$ fitted to each training condition. The grey area shows standard deviation around the estimated kernels.

Fig. 6. Learned odor navigation in worms with disrupted interneurons. (a) Five interneurons connected to the AWC sensory neuron. Schematic generated from nemanode.org. (b) Weighted chemotaxis index (wCI), biased random walk (BRW), and weathervaning (WV) indices computed from the inferred dPAW for all mutants across three training conditions. BRW and WV correspond to the norm of kernels $K_C$ and $K_{dC^\perp}$ in dPAW. The circles show values from wild-type N2 worms and arrows show the modulation in mutant worms. We record 3-5 plates for each strain and condition, resulting in 200-500 tracks in each measurement. (c) Weights in the neuron-behavior matrix $W$ across three training conditions. The two rows are for different behavioral strategies and the columns are for different interneurons.

Fig. S1. Odor landscape experienced by the animal is known, stable across space and time, and calibrated against a photoionization detector. (a) An array of metal-oxide sensors (MOS) is used to monitor odor concentration. The metal-oxide sensor is calibrated against a downstream photoionization detector (PID) to provide parts per million of butanone (4). (b) Odor gradient experienced by the animal in a typical experiment, as measured by the full array of sensors (black circles). Inferred odor concentration via interpolation is shown. (c) The odor concentration profile along the boundary is stable across the duration of the experiment. Readout from a row of boundary sensors downstream from the odor flow path during animal experiments.
Fig. S6. Variability of the inferred kernels $K_C$ across 5 datasets subsampled (without replacement) from the full dataset, for each of the three training conditions. The inferred kernels $K_C$ for each subsample are shown as color-coded lines. The estimated kernels were highly consistent in the appetitive condition, but more variable in the naive and aversive conditions.

Fig. S7. Histogram of the odor concentration change $\Delta C$ along all trajectories observed after three different training conditions. The aversive condition has a qualitative shift away from the appetitive and naive conditions. However, as quantified in Figure 4a, concentration alone has lower predictive power compared to the full dPAW model.

Fig. S9. Optogenetic kernels extracted from white-noise optogenetic stimulation measurements. Changes to these optogenetic kernels mimic changes to the odor kernels that were observed in the odor landscape. (a) Behavior-triggered average (BTA) for reversal computed from Gaussian white noise stimuli after three training conditions. The shaded area shows standard error of the mean around the mean intensity (µW/mm$^2$). (b) Comparing BTA from (a) after baseline subtraction for three training conditions.

Fig. S10. Disruptions to different interneurons result in qualitatively different locomotion trajectories. Trajectories from worms after appetitive training for disruptions to four different classes of interneurons are shown. Green dots and red dots indicate the starting point and ending point of each track, respectively.